Exploration-Exploitation Strategies in Deep Q-Networks Applied to Route-Finding Problems
نویسندگان
چکیده
منابع مشابه
Efficient Exploration through Bayesian Deep Q-Networks
We propose Bayesian Deep Q-Network (BDQN), a practical Thompson sampling based Reinforcement Learning (RL) Algorithm. Thompson sampling allows for targeted exploration in high dimensions through posterior sampling but is usually computationally expensive. We address this limitation by introducing uncertainty only at the output layer of the network through a Bayesian Linear Regression (BLR) mode...
متن کاملRegular Algebra Applied to Path - finding Problems
In an earlier paper, one of the authors presented an algebra for formulating and solving extremal path problems. There are striking similarities between that algebra and the algebra of regular languages, which lead one to consider whether the previous results can be generalized—for instance to path enumeration problems—and whether the algebra of regular languages can itself be profitably used f...
متن کاملDeep Abstract Q-Networks
We examine the problem of learning and planning on highdimensional domains with long horizons and sparse rewards. Recent approaches have shown great successes in many Atari 2600 domains. However, domains with long horizons and sparse rewards, such as Montezuma’s Revenge and Venture, remain challenging for existing methods. Methods using abstraction (Dietterich 2000; Sutton, Precup, and Singh 19...
متن کاملHuman and Optimal Exploration and Exploitation in Bandit Problems
We consider a class of bandit problems in which a decision-maker must choose between a set of alternativeseach of which has a fixed but unknown rate of rewardto maximize their total number of rewards over a short sequence of trials. Solving these problems requires balancing the need to search for highly-rewarding alternatives with the need to capitalize on those alternatives already known to be...
متن کاملThe Exploration-Exploitation Tradeoff in Sequential Decision Making Problems
Sequential decision making problems often require an agent to act in an environment where data is noisy or not fully observed. The agent will have to learn how different actions relate to different rewards, and must therefore balance the need to explore and exploit in an effective strategy. In this report, sequential decision making problems are considered through extensions of the multi-armed ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Journal of Physics: Conference Series
سال: 2020
ISSN: 1742-6588,1742-6596
DOI: 10.1088/1742-6596/1684/1/012073